Kriya - An end-to-end Hierarchical Phrase-based MT System
Abstract
This paper describes Kriya, a new statistical machine translation (SMT) system that uses hierarchical phrases, which were first introduced in the Hiero machine translation system (Chiang, 2007). Kriya provides both a grammar extraction module for synchronous context-free grammars (SCFGs) and a CKY-based decoder. There are several re-implementations of Hiero in the machine translation community, but Kriya offers the following novel contributions: (a) Grammar extraction in Kriya supports the full set of Hiero-style SCFG rules as well as several types of compact rule sets, which lead to faster decoding for different language pairs without compromising BLEU scores. The compact SCFGs currently supported include grammars with one non-terminal and grammars pruned on the basis of certain rule patterns. (b) The Kriya decoder offers some unique improvements in the implementation of cube pruning, such as increased diversity in the target-language n-best output and novel methods for language model (LM) integration. The Kriya decoder can take advantage of parallelization on a networked cluster. Kriya supports both KenLM and SRILM for language model queries. This paper also provides several experimental results which demonstrate that the translation quality of Kriya compares favourably to the Moses (Koehn et al., 2007) phrase-based system in several language pairs, while showing a substantial improvement for Chinese-English similar to Chiang (2007). We also quantify the model sizes for phrase-based and Hiero-style systems and present experiments comparing variants of Hiero models.
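The abstract mentions cube pruning in the decoder. As a rough illustration of the general idea only, and not of Kriya's implementation, the sketch below lazily enumerates the k best combinations of two score-sorted hypothesis lists with a heap. The function name `cube_prune`, the purely additive scoring, and the sample scores are assumptions made for this example. A real decoder would also add a language model score when two hypotheses are combined, which is what makes the search approximate in practice.

```python
import heapq


def cube_prune(left, right, k):
    """Return up to k best (score, i, j) combinations of two score lists.

    Both lists hold log-probabilities sorted in descending order (higher is
    better). Scores are simply added here (an assumption for illustration);
    no language model score is included.
    """
    if not left or not right or k <= 0:
        return []
    # heapq is a min-heap, so store negated scores to pop the best first.
    heap = [(-(left[0] + right[0]), 0, 0)]
    seen = {(0, 0)}
    best = []
    while heap and len(best) < k:
        neg_score, i, j = heapq.heappop(heap)
        best.append((-neg_score, i, j))
        # Explore the two grid neighbours of the cell that was just popped.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(left[ni] + right[nj]), ni, nj))
    return best


if __name__ == "__main__":
    # Hypothetical log-probabilities for two child spans.
    for score, i, j in cube_prune([-1.0, -2.0, -4.0], [-0.5, -1.5], k=3):
        print(score, i, j)
```

Only cells adjacent to already-popped cells are ever scored, so far fewer than `len(left) * len(right)` combinations are evaluated when k is small.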
Similar resources
Kriya - The SFU System for Translation Task at WMT-12
This paper describes our submissions for the WMT-12 translation task using Kriya our hierarchical phrase-based system. We submitted systems in French-English and English-Czech language pairs. In addition to the baseline system following the standard MT pipeline, we tried ensemble decoding for French-English. The ensemble decoding method improved the BLEU score by 0.4 points over the baseline in...
Gappy Pattern Matching on GPUs for On-Demand Extraction of Hierarchical Translation Grammars
Grammars for machine translation can be materialized on demand by finding source phrases in an indexed parallel corpus and extracting their translations. This approach is limited in practical applications by the computational expense of online lookup and extraction. For phrase-based models, recent work has shown that on-demand grammar extraction can be greatly accelerated by parallelization on ...
The RWTH Aachen German to English MT System for IWSLT 2015
This work describes the statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2015. We participated in the MT and SLT tracks for the German→English language pair. We employ our state-of-the-art phrase-based and hierarchical phrase-based baseline systems for the MT track. ...
Application of Translation Knowledge Acquired by Hierarchical Phrase Alignment for Pattern-based MT
Hierarchical phrase alignment is a method for extracting equivalent phrases from bilingual sentences, even though they belong to different language families. The method automatically extracts transfer knowledge from about 125K English and Japanese bilingual sentences and then applies it to a pattern-based MT system. The translation quality is then evaluated. The knowledge needs to be cleaned, s...
Hallucinating Phrase Translations for Low Resource MT
We demonstrate that “hallucinating” phrasal translations can significantly improve the quality of machine translation in low resource conditions. Our hallucinated phrase tables consist of entries composed from multiple unigram translations drawn from the baseline phrase table and from translations that are induced from monolingual corpora. The hallucinated phrase table is very noisy. Its transl...
Journal:
- Prague Bull. Math. Linguistics
Volume: 97
Publication date: 2012